CVaR






On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

Neural Information Processing Systems

Risk-averse reinforcement learning (RL) seeks to provide a risk-averse policy for high-stakes real-world decision problems. These high-stakes domains include autonomous driving (Jin et al., 2019; Sharma et al., 2020), robot collision avoidance (Ahmadi et al., 2021; Hakobyan and Yang, 2021),






Reviewer 1 - Use of mini-batches: in our experiments, we indeed use mini-batches of size B, by sampling B points

Neural Information Processing Systems

We would like to thank all reviewers for their valuable feedback and comments. Please find our responses below. This is because it predicts an almost uniform distribution. AdaCVaR also has a lower CVaR than ERM (standard SGD). Thank you for observing that.